The transferability of adversarial examples is a key issue for the application of such attacks against multimedia forensics (MMF) techniques based on deep learning (DL) in real-world settings. In fact, transferability would open the way to the deployment of counter-forensic attacks even when the attacker has no complete knowledge of the system under attack. Some preliminary works have shown that adversarial examples against CNN-based image forensic detectors are generally non-transferable, at least when the basic versions of the attacks implemented in the most popular libraries are adopted. In this paper, we introduce general strategies to increase the strength of the attacks and evaluate their transferability as the strength varies. We show experimentally that, in this way, attack transferability can be largely increased, at the expense of a larger distortion. Our research confirms the security threat posed by the existence of adversarial examples even in multimedia forensics scenarios, thus calling for new defense strategies to improve the security of DL-based MMF techniques.
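As an illustration of what "increasing attack strength" can mean in practice, below is a minimal PyTorch sketch of an L-inf PGD attack whose budget `eps` and iteration count `steps` act as strength knobs. The abstract does not commit to a specific attack, so this function and its parameters are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=50):
    """L-inf PGD sketch: raising `eps` and `steps` strengthens the attack
    at the cost of larger distortion, the trade-off studied above."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # keep a valid image
    return x_adv.detach()
```

Transferability would then be measured by crafting `x_adv` on a surrogate detector and checking whether a different, unseen detector is also fooled.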
Despite excellent performance in image generation, Generative Adversarial Networks (GANs) are notorious for their enormous storage requirements and intensive computation. As an effective "performance maker", knowledge distillation has proven particularly efficacious in exploring low-cost GANs. In this paper, we investigate the irreplaceability of the teacher discriminator and present an inventive discriminator-cooperated distillation, abbreviated as DCD, towards refining better feature maps from the generator. In contrast to conventional pixel-to-pixel matching methods in feature-map distillation, our DCD utilizes the teacher discriminator as a transformation to drive intermediate results of the student generator to be perceptually close to the corresponding outputs of the teacher generator. Furthermore, to mitigate mode collapse in GAN compression, we construct a collaborative adversarial training paradigm in which the teacher discriminator is established from scratch to co-train with the student generator alongside our DCD. Our DCD shows superior results compared with existing GAN compression methods. For instance, after reducing MACs by over 40x and parameters by over 80x on CycleGAN, we decrease the FID metric from 61.53 to 48.24, while the current SoTA method only reaches 51.92. The source code of this work is available at https://github.com/poopit/DCD-official.
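A hedged sketch of the distillation idea described above: instead of matching student and teacher outputs pixel-to-pixel, the frozen teacher discriminator is used as a learned perceptual transformation and its intermediate activations are matched. `extract_features` is a hypothetical hook-based accessor, not an API from the paper's code.

```python
import torch
import torch.nn.functional as F

def dcd_loss(teacher_D, out_s, out_t):
    """Discriminator-cooperated distillation (sketch): compare student and
    teacher generator outputs through the frozen teacher discriminator's
    intermediate activations rather than pixel-to-pixel."""
    feats_s = teacher_D.extract_features(out_s)  # hypothetical accessor
    with torch.no_grad():
        feats_t = teacher_D.extract_features(out_t)
    loss = out_s.new_zeros(())
    for fs, ft in zip(feats_s, feats_t):
        loss = loss + F.l1_loss(fs, ft)          # perceptual feature match
    return loss
```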
CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed images and the corresponding labels harms its efficacy. Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels, but inevitably introduce heavy training overhead or require extra information, undermining ease of use. To this end, we propose an efficient and effective Self-Motivated image Mixing method (SMMix), in which both the image and label enhancement are motivated by the model under training itself. Specifically, we propose a max-min attention region mixing approach that enriches the attention-focused objects in the mixed images. We then introduce a fine-grained label assignment technique that co-trains the output tokens of mixed images with fine-grained supervision. Moreover, we devise a novel feature consistency constraint to align features from mixed and unmixed images. Owing to the subtle design of the self-motivated paradigm, our SMMix stands out for its smaller training overhead and better performance than other CutMix variants. In particular, SMMix improves the accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on ImageNet-1k. The generalization capability of our method is also demonstrated on downstream tasks and out-of-distribution datasets. Code for this project is available at https://github.com/ChenMnZ/SMMix.
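The following sketch captures only the "max" half of the max-min mixing, under simplifying assumptions (square patch grid, a single uniform label weight `lam` instead of the paper's fine-grained label assignment); `attn` stands for the under-training model's own CLS attention over patch tokens.

```python
import torch

def smmix(images, attn, k=49):
    """Simplified attention-region mixing: each image receives the k patches
    its shuffled partner attends to most, at the partner's patch locations.
    images: (B, C, H, W); attn: (B, N) CLS-attention over N patch tokens."""
    B, C, H, W = images.shape
    N = attn.shape[1]
    g = int(N ** 0.5)                                  # patch grid side
    perm = torch.randperm(B, device=images.device)
    src = images[perm]
    top = attn[perm].topk(k, dim=1).indices            # partner's top-k patches
    mask = torch.zeros(B, N, device=images.device)
    mask.scatter_(1, top, 1.0)
    mask = torch.nn.functional.interpolate(
        mask.view(B, 1, g, g), size=(H, W), mode="nearest")
    mixed = mask * src + (1 - mask) * images           # patch-level paste
    lam = k / N                                        # source label weight
    return mixed, perm, lam
```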
This paper proposes content relationship distillation (CRD) to tackle over-parameterized generative adversarial networks (GANs) for serviceability on cutting-edge devices. In contrast to traditional instance-level distillation, we design a novel GAN-compression-oriented knowledge by slicing the contents of teacher outputs into multiple fine-grained granularities, such as row/column strips (global information) and image patches (local information), modeling the relationships among them, such as pairwise distances and triplet-wise angles, and encouraging the student to capture these relationships within its own output contents. Built upon our proposed content-level distillation, we also deploy an online teacher discriminator, which keeps updating when co-trained with the teacher generator and stays frozen when co-trained with the student generator, for better adversarial training. We perform extensive experiments on three benchmark datasets, the results of which show that our CRD achieves the highest complexity reduction on GANs while obtaining the best performance in comparison with existing methods. For example, we reduce the MACs of CycleGAN by around 40x and its parameters by over 80x, while obtaining an FID of 46.61 compared with 51.92 for the current state of the art. Code for this project is available at https://github.com/TheKernelZ/CRD.
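A minimal sketch of the content-relationship idea, restricted to row strips and pairwise distances (the paper also uses column strips, image patches, and triplet-wise angles); all function names here are illustrative, not taken from the released code.

```python
import torch
import torch.nn.functional as F

def pairwise_relation(parts):
    """Scale-normalized pairwise-distance matrix among content slices."""
    flat = parts.flatten(1)              # (n_parts, -1)
    d = torch.cdist(flat, flat, p=2)     # (n_parts, n_parts)
    return d / (d.mean() + 1e-6)

def crd_loss(out_s, out_t, strips=4):
    """Match relationships among row strips of student/teacher outputs
    instead of the raw pixels themselves."""
    rows_s = out_s.chunk(strips, dim=2)  # split along image height
    rows_t = out_t.chunk(strips, dim=2)
    rel_s = pairwise_relation(torch.stack([r.mean(0) for r in rows_s]))
    rel_t = pairwise_relation(torch.stack([r.mean(0) for r in rows_t]))
    return F.smooth_l1_loss(rel_s, rel_t)
```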
Most shadow removal methods rely on training images paired with laborious and costly shadow-region annotations, leading to the increasing popularity of shadow image synthesis. However, poor performance also stems from these synthesized images, since they are often shadow-inauthentic and detail-impaired. In this paper, we present a novel generation framework, referred to as HQSS, for high-quality pseudo shadow image synthesis. A given image is first decoupled into a shadow-region identity and a non-shadow-region identity. HQSS employs a shadow feature encoder and a generator to synthesize pseudo images. Specifically, the encoder extracts the shadow feature of one region identity, which is then paired with another region identity to serve as the generator input for synthesizing a pseudo image. The pseudo image is expected to carry the shadow characteristics of its input shadow feature as well as real-like image details from its input region identity. To fulfill this goal, we design three learning objectives. When the shadow feature and input region identity come from the same region identity, we propose a self-reconstruction loss that guides the generator to reconstruct a pseudo image identical to its input. When they come from different identities, we introduce an inter-reconstruction loss and a cycle-reconstruction loss to make sure that the shadow characteristics and detail information are well retained in the synthesized images. Our HQSS is observed to outperform state-of-the-art methods on the ISTD, Video Shadow Removal, and SRD datasets. The code is available at https://github.com/zysxmu/HQSS.
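To make the three objectives concrete, here is a hedged sketch with a hypothetical shadow-feature encoder `E` and generator `G`; the exact loss forms and weighting in HQSS may differ.

```python
import torch.nn.functional as F

def hqss_losses(E, G, region_a, region_b):
    """Sketch of the three HQSS objectives over two region identities."""
    f_a, f_b = E(region_a), E(region_b)
    # Self-reconstruction: shadow feature and content come from the SAME
    # identity, so the generator should reproduce its input exactly.
    loss_self = F.l1_loss(G(f_a, region_a), region_a)
    # Inter-reconstruction: a pseudo image built from (A's shadow feature,
    # B's content) should re-encode back to A's shadow feature.
    pseudo = G(f_a, region_b)
    loss_inter = F.l1_loss(E(pseudo), f_a)
    # Cycle-reconstruction: injecting B's own shadow feature back into the
    # pseudo image should recover B's details.
    loss_cycle = F.l1_loss(G(f_b, pseudo), region_b)
    return loss_self + loss_inter + loss_cycle
```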
Nowadays, Multi-purpose Messaging Mobile Apps (MMMAs) have become increasingly prevalent. MMMAs attract fraudsters, and some cybercriminals provide support for frauds via black market accounts (BMAs). Compared to fraudsters, BMAs are not directly involved in frauds and are more difficult to detect. This paper presents our BMA detection system SGRL (Self-supervised Graph Representation Learning), used in WeChat, a representative MMMA with over a billion users. We tailor Graph Neural Networks and Graph Self-supervised Learning in SGRL for BMA detection. The workflow of SGRL consists of a pretraining phase that utilizes structural information, node attribute information, and available human knowledge, and a lightweight detection phase. In offline experiments, SGRL outperforms state-of-the-art methods by 16.06%-58.17% on offline evaluation measures. We deployed SGRL in the online environment to detect BMAs on the billion-scale WeChat graph, where it exceeds the alternative by 7.27% on the online evaluation measure. In conclusion, SGRL can alleviate label reliance, generalize well to unseen data, and effectively detect BMAs in WeChat.
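A rough sketch of the two-phase workflow, with a generic graph encoder standing in for the tailored GNN and a hypothetical PyG-style `graph` object; the actual SGRL objectives and features are WeChat-specific and not reproduced here.

```python
import torch

def pretrain(gnn, ssl_head, graph, epochs=10, lr=1e-3):
    """Label-free pretraining over structure and node attributes."""
    params = list(gnn.parameters()) + list(ssl_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        z = gnn(graph.x, graph.edge_index)   # node embeddings
        loss = ssl_head(z, graph)            # self-supervised objective
        opt.zero_grad()
        loss.backward()
        opt.step()

def detect(gnn, clf, graph):
    """Lightweight detection phase on frozen embeddings."""
    with torch.no_grad():
        z = gnn(graph.x, graph.edge_index)
    return clf(z).sigmoid()                  # per-account BMA score
```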
Modeling sparse and dense image matching within a unified functional correspondence model has recently attracted research interest. However, existing efforts mainly focus on improving matching accuracy while neglecting efficiency, which is crucial for real-world applications. In this paper, we propose an efficient structure that finds correspondences in a coarse-to-fine manner, significantly improving the efficiency of the functional correspondence model. To achieve this, multiple transformer blocks are stage-wise connected to progressively refine the predicted coordinates over a shared multi-scale feature extraction network. Given a pair of images and arbitrary query coordinates, all correspondences are predicted within a single feed-forward pass. We further propose an adaptive query-clustering strategy and an uncertainty-based outlier detection module that cooperate with the proposed framework for faster and better predictions. Experiments on various sparse and dense matching tasks demonstrate the superiority of our method over existing state-of-the-art works in both efficiency and effectiveness.
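The stage-wise refinement can be summarized by the following hedged sketch, in which `backbone` and the transformer `stages` are hypothetical modules and each stage predicts a coordinate correction.

```python
import torch

def predict_matches(backbone, stages, img_a, img_b, queries):
    """queries: (B, Q, 2) coordinates in img_a; returns coordinates in
    img_b, refined coarse-to-fine within one feed-forward pass."""
    feats_a = backbone(img_a)      # shared multi-scale pyramid, coarse -> fine
    feats_b = backbone(img_b)
    coords = queries.clone()       # coarse initialization
    for stage, fa, fb in zip(stages, feats_a, feats_b):
        coords = coords + stage(fa, fb, queries, coords)  # stage-wise refinement
    return coords
```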
Masked autoencoders have become a popular training paradigm for self-supervised visual representation learning. These models randomly mask a portion of the input and reconstruct the masked portion according to target representations. In this paper, we first show that a careful choice of the target representation is unnecessary for learning good representations, since different targets tend to derive similarly-behaved models. Driven by this observation, we propose a multi-stage masked distillation pipeline that uses a randomly initialized model as the teacher, enabling us to effectively train high-capacity models without carefully designing target representations. Interestingly, we further explore teachers of larger capacity, obtaining distilled students with remarkable transfer ability. On different tasks of classification, transfer learning, object detection, and semantic segmentation, the proposed method of performing masked knowledge distillation with bootstrapped teachers (dBOT) outperforms previous self-supervised methods by non-trivial margins. We hope our findings, as well as the proposed method, can motivate people to rethink the role of target representations in pre-training masked autoencoders.
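The bootstrapping loop admits a compact sketch; `make_model` and `train_stage` (which would train the student to reconstruct the frozen teacher's features at masked positions) are hypothetical stand-ins.

```python
import copy

def dbot(make_model, train_stage, stages=3):
    """Multi-stage masked distillation: the first teacher is randomly
    initialized; after each stage the student is promoted to teacher,
    so no hand-designed target representation is ever needed."""
    teacher = make_model()                # random weights as first targets
    for _ in range(stages):
        student = make_model()
        train_stage(student, teacher)     # masked feature reconstruction
        teacher = copy.deepcopy(student)  # bootstrap: student -> teacher
        for p in teacher.parameters():
            p.requires_grad_(False)       # freeze the new teacher
    return teacher
```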
Over the past decade, the remarkable successes of neural networks in a huge variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis. However, the high dimensionality of such inverse problems has simultaneously left current theory, which predicts that networks should scale exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, this paper provides a general method for bounding the complexity required for a neural network to approximate a Lipschitz function on a high-dimensional set with a low-complexity structure. The approach is based on the observation that the existence of a linear Johnson-Lindenstrauss (JL) embedding $\mathbf{A} \in \mathbb{R}^{d \times D}$ of a given high-dimensional set $\mathcal{S} \subset \mathbb{R}^D$ into a low-dimensional cube $[-M, M]^d$ implies that for any Lipschitz function $f : \mathcal{S} \to \mathbb{R}^p$, there exists a Lipschitz function $g : [-M, M]^d \to \mathbb{R}^p$ such that $g(\mathbf{A}\mathbf{x}) = f(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{S}$. Hence, if one has a neural network that approximates $g : [-M, M]^d \to \mathbb{R}^p$, a layer implementing the JL embedding $\mathbf{A}$ can be added to obtain a neural network that approximates $f : \mathcal{S} \to \mathbb{R}^p$. By pairing JL embedding results with results on the approximation of Lipschitz functions by neural networks, one then obtains bounds on the complexity required for a neural network to approximate Lipschitz functions on high-dimensional sets. The end result is a general theoretical framework that can then be used to better explain the empirical success of smaller networks in a wider variety of inverse problems than current theory allows.
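The core composition argument can be restated compactly; the display below paraphrases the abstract's construction in LaTeX.

```latex
% If A : S -> [-M,M]^d is a JL embedding of S \subset R^D and
% g(Ax) = f(x) on S, then a network \hat{g} approximating g yields the
% network \hat{g} \circ A approximating f with the same uniform error:
\[
  \sup_{\mathbf{x} \in \mathcal{S}}
    \bigl\| (\widehat{g} \circ \mathbf{A})(\mathbf{x}) - f(\mathbf{x}) \bigr\|
  = \sup_{\mathbf{x} \in \mathcal{S}}
    \bigl\| \widehat{g}(\mathbf{A}\mathbf{x}) - g(\mathbf{A}\mathbf{x}) \bigr\|
  \le \sup_{\mathbf{y} \in [-M,M]^d}
    \bigl\| \widehat{g}(\mathbf{y}) - g(\mathbf{y}) \bigr\|,
\]
% so the approximation cost is governed by the embedding dimension d,
% not by the ambient dimension D.
```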
This paper focuses on the limitations of current over-parameterized shadow removal models. We present a novel lightweight deep neural network that processes shadow images in the LAB color space. The proposed network, termed "LAB-Net", is motivated by the following three observations. First, the LAB color space can well separate luminance information and color properties. Second, sequentially stacked convolutional layers fail to make full use of features from different receptive fields. Third, non-shadow regions are important prior knowledge for diminishing the drastic differences between shadow and non-shadow regions. Consequently, we design our LAB-Net with a two-branch structure: an L branch and an AB branch. Thus, shadow-related luminance information can be well processed in the L branch, while color properties are well retained in the AB branch. In addition, each branch is composed of several basic blocks, a local spatial attention module (LSA), and convolutional filters. Each basic block consists of multiple parallelized dilated convolutions with divergent dilation rates to receive different receptive fields, at different network widths, in order to save model parameters and computational cost. Then, an enhanced channel attention module (ECA) is constructed to aggregate features from different receptive fields for better shadow removal. Finally, the LSA module is further developed to fully exploit the prior information in non-shadow regions to clean the shadow regions. We perform extensive experiments on the ISTD and SRD datasets. Experimental results show that our LAB-Net well outperforms state-of-the-art methods. Also, our model's parameters and computational costs are reduced by several orders of magnitude. Our code is available at https://github.com/ngrxmu/lab-net.
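A minimal sketch of the two-branch layout, with hypothetical sub-networks `l_branch` and `ab_branch`; RGB-to-LAB conversion (e.g. via kornia.color.rgb_to_lab) is assumed to happen upstream.

```python
import torch

def lab_net_forward(l_branch, ab_branch, lab_image):
    """Two-branch split in LAB space: the L branch handles the
    shadow-related luminance while the AB branch preserves color.
    lab_image: (B, 3, H, W) with channels [L, A, B]."""
    L = lab_image[:, :1]    # luminance channel: carries the shadow
    AB = lab_image[:, 1:]   # chrominance channels: color properties
    return torch.cat([l_branch(L), ab_branch(AB)], dim=1)
```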